Midterm exams

This is a "closed book" examination - in particular, you are not to use any resources outside of this notebook (except possibly pen and paper). You may consult help from within the notebook using ? but not any online references. You should turn wireless off or set your laptop in "Airplane" mode prior to taking the exam.

You have 2 hours to complete the exam.



In [38]:

    
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns

Q1 (10 points)

Given the 2 matrices

A = np.array([[1,2,3],[4,5,6]])
B = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])

Perform matrix multiplication of A and B using the following methods:

Using nested for loops without the dot function (4 points)
Using numpy (2 points)
Using R (start the first line of a new cell with %%R). You should pass in the A and B matrices defined in Python for full marks, but partial credit will be given if you redefine them in R (4 points)



In [49]:

    
import numpy as np



In [40]:

    
A = np.array([[1,2,3],[4,5,6]])
B = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])



In [41]:

    
m, n = A.shape
n, p = B.shape
C = np.zeros((m, p))
for i in range(m):
    for j in range(p):
        for k in range(n):
            C[i,j] += A[i,k] * B[k, j]
C









    Out[41]:





array([[  38.,   44.,   50.,   56.],
       [  83.,   98.,  113.,  128.]])



In [42]:

    
A @ B









    Out[42]:





array([[ 38,  44,  50,  56],
       [ 83,  98, 113, 128]])
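
As a cross-check on the two results above, here is a minimal pure-Python sketch of the same product that needs neither NumPy nor explicit index loops. The names matmul, A_rows and B_rows are hypothetical and are not part of the exam solution; the matrices are assumed to be plain lists of rows.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows (hypothetical helper)."""
    # Every row of A must have as many entries as B has rows.
    if any(len(row) != len(B) for row in A):
        raise ValueError("inner dimensions do not match")
    columns = list(zip(*B))  # transpose B so each column becomes a tuple
    # Entry (i, j) is the dot product of row i of A with column j of B.
    return [[sum(a * b for a, b in zip(row, col)) for col in columns]
            for row in A]

A_rows = [[1, 2, 3], [4, 5, 6]]
B_rows = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
print(matmul(A_rows, B_rows))  # [[38, 44, 50, 56], [83, 98, 113, 128]]
```

The entries are integers here because the inputs are integers; the loop version above returns floats only because C was created with np.zeros.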



In [51]:

    
%load_ext rpy2.ipython









    



The rpy2.ipython extension is already loaded. To reload it, use:
  %reload_ext rpy2.ipython



In [43]:

    
%R -iA,B A %*% B









    Out[43]:





array([[  38.,   44.,   50.,   56.],
       [  83.,   98.,  113.,  128.]])



In [44]:

    
%R -o iris

Q2 (10 points)

Read the data/iris.csv data set into a Pandas DataFrame, and answer the following questions:

Find the mean, min and max values of all four measurements (sepal.length, sepal.width, petal.length, petal.width) for each species
Find the average values of each measurement for rows where the petal.length is less than the sepal.width



In [50]:

    
import pandas as pd



In [47]:

    
df = pd.read_csv('data/iris.csv')
df.groupby('Species').agg(['mean', 'min', 'max'])









    Out[47]:






  
    
      
              Sepal.Length     Sepal.Width      Petal.Length     Petal.Width
               mean  min  max   mean  min  max   mean  min  max   mean  min  max
Species
setosa        5.006  4.3  5.8  3.428  2.3  4.4  1.462  1.0  1.9  0.246  0.1  0.6
versicolor    5.936  4.9  7.0  2.770  2.0  3.4  4.260  3.0  5.1  1.326  1.0  1.8
virginica     6.588  4.9  7.9  2.974  2.2  3.8  5.552  4.5  6.9  2.026  1.4  2.5



In [48]:

    
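# numeric_only=True skips the string-valued Species column, which newer pandas versions refuse to average
df[df['Petal.Length'] < df['Sepal.Width']].mean(numeric_only=True)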









    Out[48]:





Sepal.Length    5.006
Sepal.Width     3.428
Petal.Length    1.462
Petal.Width     0.246
dtype: float64

Q3 (10 points)

Find the longest sequence of repeated letters (e.g. 'AAA') in the string below. Print 1) the length, 2) the index of the starting location, 3) the actual sequence. If there are ties, print the last sequence found. You can assume that only the letters A, C, T and G are found in the string.

TGTAGTCCATGCGGAATTCCACAGGGGCTCTGGGGACAGATTCGGACCTTTCTGTCAACGCCAATCATGGAGGTAGTGTGAGGTATAAATTTGGTCGGCGTAGGTCAAGAAAACCCACCTGCGCTGCTGTACGACACATGGCCGAGGCTTCAAGGGCATTCCACGAAGAGGCTCATGGCAACGCCTCTCGAAAGCTGGCGCTCAGGAAGGTACGATCACCCTCGAAATCAAAGATTTCATCTGAAATAAAAGTTAGTACGCCACTTTAGGGTATCGAGTACTTACCCATTTATAACGGAGGCTGAGCGAACGCTTGGCTGATGAAAAAACAACACTCGGTATAAACGGCGATTTCCACTGATCCAGGTAAAGCATGTTTGTGGATAGCAAGGGCAAGTAGTATGCAGCGAGTTTCGTGACAGTATAGCTCGACATGTATATCTCTGTGGGCGCATTTGGATGCTGTATACTGTAGAAGCAGTATATTCCCTGATGACCGAACTTACTACAAGTTGTTGTCTCGACAGGTAGTACGTGTGATCTGTGTCTGAGACCTGCAACTGGTGCGCATTGAAACTTCGTACATAAACCTACCGACTTCACCGTTTCGGCGTCGGCTTGTAACTGGAGAGTGTTGTTGCGTCATGGTCGATTGAGGATTTGGCCTAAATGTAGCGCGTATACACTGCATTATTAGCGGCTTCGAGGAACATGTAATGGGCGAGGACAGAGAATTGTATGAGATTCAAACTGCCAGGTTTTATGGCGGACCCCTGCTCCCATTGTAATCGACCGGCGGCTGGGGTACGCCCGCACGAGGGTATCGGTAGTATATCTAGCTAAGCTCCGGTGTATGCTGTTGAGACACCATTCATGCGCAAAGCCCCACCGTGCACGCATGCGATGATAAATAAGGATGACTATGGCTTACAGAGATCTTTTTCAGGGGCGTCTTGCAATAATGGTTGATAAATGTGTTTTGCCGAATCAACTGCGCGGC



In [52]:

    
import re



In [53]:

    
s = "TGTAGTCCATGCGGAATTCCACAGGGGCTCTGGGGACAGATTCGGACCTTTCTGTCAACGCCAATCATGGAGGTAGTGTGAGGTATAAATTTGGTCGGCGTAGGTCAAGAAAACCCACCTGCGCTGCTGTACGACACATGGCCGAGGCTTCAAGGGCATTCCACGAAGAGGCTCATGGCAACGCCTCTCGAAAGCTGGCGCTCAGGAAGGTACGATCACCCTCGAAATCAAAGATTTCATCTGAAATAAAAGTTAGTACGCCACTTTAGGGTATCGAGTACTTACCCATTTATAACGGAGGCTGAGCGAACGCTTGGCTGATGAAAAAACAACACTCGGTATAAACGGCGATTTCCACTGATCCAGGTAAAGCATGTTTGTGGATAGCAAGGGCAAGTAGTATGCAGCGAGTTTCGTGACAGTATAGCTCGACATGTATATCTCTGTGGGCGCATTTGGATGCTGTATACTGTAGAAGCAGTATATTCCCTGATGACCGAACTTACTACAAGTTGTTGTCTCGACAGGTAGTACGTGTGATCTGTGTCTGAGACCTGCAACTGGTGCGCATTGAAACTTCGTACATAAACCTACCGACTTCACCGTTTCGGCGTCGGCTTGTAACTGGAGAGTGTTGTTGCGTCATGGTCGATTGAGGATTTGGCCTAAATGTAGCGCGTATACACTGCATTATTAGCGGCTTCGAGGAACATGTAATGGGCGAGGACAGAGAATTGTATGAGATTCAAACTGCCAGGTTTTATGGCGGACCCCTGCTCCCATTGTAATCGACCGGCGGCTGGGGTACGCCCGCACGAGGGTATCGGTAGTATATCTAGCTAAGCTCCGGTGTATGCTGTTGAGACACCATTCATGCGCAAAGCCCCACCGTGCACGCATGCGATGATAAATAAGGATGACTATGGCTTACAGAGATCTTTTTCAGGGGCGTCTTGCAATAATGGTTGATAAATGTGTTTTGCCGAATCAACTGCGCGGC"



In [97]:

    
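n = 0
idx = None

current = s[0]
count = 1
for i, ch in enumerate(s[1:], 1):
    if ch == current:
        count += 1
    else:
        # a run has just ended at position i; >= keeps the last run on ties
        if count >= n:
            n = count
            idx = i - count
        count = 1
        current = ch

# the loop never closes the final run, so check it separately
if count >= n:
    n = count
    idx = len(s) - count

print(n, idx, s[idx:(idx+n)])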









    



6 323 AAAAAA



In [90]:

    
n = 0
idx = None

# (.)\1+ matches a character followed by one or more copies of itself
for m in re.finditer(r'(.)\1+', s):
    x = m.group(0)
    if len(x) >= n:  # >= keeps the last run on ties
        n = len(x)
        idx = m.start()

print(n, idx, s[idx:(idx+n)])









    



6 323 AAAAAA



In [88]:

    
n = 0
idx = None

for m in re.finditer(r'(A+|C+|T+|G+)', s):
    x = m.group(1)
    if len(x) >= n:  # >= keeps the last run on ties, as the question asks
        n = len(x)
        idx = m.start()

print(n, idx, s[idx:(idx+n)])









    



6 323 AAAAAA
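
For comparison, the same search can be sketched with itertools.groupby from the standard library, which splits a string into consecutive runs of equal characters. The function name longest_run and the short test string are hypothetical; the tie rule (last run wins) follows the question.

```python
from itertools import groupby

def longest_run(seq):
    """Return (length, start, run) for the longest run of one repeated
    character in seq. On ties the last run wins."""
    best_len, best_start = 0, None
    pos = 0  # start index of the current run
    for _, group in groupby(seq):
        run_len = sum(1 for _ in group)
        if run_len >= best_len:  # >= so that a later tie replaces an earlier one
            best_len, best_start = run_len, pos
        pos += run_len
    if best_start is None:  # empty input
        return 0, None, ""
    return best_len, best_start, seq[best_start:best_start + best_len]

# Runs are AA (index 0), CCC (2), G (5), TTT (6); the tie goes to TTT.
print(longest_run("AACCCGTTT"))  # (3, 6, 'TTT')
```

Because groupby also yields the final run, this version needs no separate check after the loop.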

Q4 (10 points)

Euclid's algorithm for finding the greatest common divisor of two numbers is

gcd(a, 0) = a
gcd(a, b) = gcd(b, a modulo b)

Write a function to find the greatest common divisor in Python (4 points)
What is the greatest common divisor of 17384 and 1928? (1 point)
Write a function to calculate the least common multiple (4 points)
What is the least common multiple of 17384 and 1928? (1 point)

Note:

The greatest common divisor of two integers is the largest positive integer that divides both numbers.
The least common multiple of two integers is the smallest positive integer that is a multiple of both.



In [98]:

    
def gcd(a, b):
    if b == 0:
        return a
    else:
        return gcd(b, a % b)



In [99]:

    
gcd(17384, 1928)









    Out[99]:





8



In [104]:

    
def lcm(a, b):
    return (a*b) // gcd(a, b)



In [105]:

    
lcm(17384, 1928)









    Out[105]:





4189544
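
The recursion can also be written as a loop. The sketch below assumes non-negative integer inputs, uses the hypothetical names gcd_iter and lcm_iter, and checks the result against math.gcd from the standard library.

```python
import math

def gcd_iter(a, b):
    """Iterative Euclid: replace (a, b) by (b, a % b) until b is 0."""
    # For 17384 and 1928 the pairs are
    # (17384, 1928) -> (1928, 32) -> (32, 8) -> (8, 0)
    while b != 0:
        a, b = b, a % b
    return a

def lcm_iter(a, b):
    """Least common multiple via lcm(a, b) = a * b / gcd(a, b)."""
    if a == 0 or b == 0:
        return 0  # avoids dividing by gcd(0, 0) == 0
    return a // gcd_iter(a, b) * b  # divide first to keep the intermediate small

assert gcd_iter(17384, 1928) == math.gcd(17384, 1928)
print(gcd_iter(17384, 1928), lcm_iter(17384, 1928))  # 8 4189544
```

The loop avoids Python's recursion limit, although Euclid's algorithm needs so few steps that the recursive version is unlikely to hit it in practice.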

Q5 (10 points)

Write a function to flatten a list of lists using

For loops (2 points)
List comprehensions (4 points)
The reduce higher-order function (4 points)

For example,

flatten([[1,2], [3,4,5],[6,7,8,9]])

should return

[1,2,3,4,5,6,7,8,9]



In [106]:

    
def flatten1(list_of_lists):
    xs = []
    for alist in list_of_lists:
        for item in alist:
            xs.append(item)
    return xs



In [111]:

    
def flatten2(list_of_lists):
    return [item for alist in list_of_lists for item in alist]



In [133]:

    
from functools import reduce



In [135]:

    
def flatten3(list_of_lists):
    return list(reduce(lambda a, b: a + b, list_of_lists, []))



In [ ]:

    
xs = [[1,2], [3,4,5],[6,7,8,9]]



In [124]:

    
flatten1(xs)









    Out[124]:





[1, 2, 3, 4, 5, 6, 7, 8, 9]



In [125]:

    
flatten2(xs)









    Out[125]:





[1, 2, 3, 4, 5, 6, 7, 8, 9]



In [136]:

    
flatten3(xs)









    Out[136]:





[1, 2, 3, 4, 5, 6, 7, 8, 9]
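
Outside the three methods the question asks for, the standard library offers itertools.chain.from_iterable, which is a common way to flatten one level of nesting. The name flatten_chain below is hypothetical and is shown only as a point of comparison.

```python
from itertools import chain

def flatten_chain(list_of_lists):
    """Flatten one level of nesting by chaining the inner lists together."""
    return list(chain.from_iterable(list_of_lists))

print(flatten_chain([[1, 2], [3, 4, 5], [6, 7, 8, 9]]))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Unlike the reduce version, which builds a new list at every a + b step, this visits each item once, so it scales better when there are many inner lists.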
